2,225 research outputs found
Design diversity: an update from research on reliability modelling
Diversity between redundant subsystems is, in various forms, a common design approach for improving system dependability. Its value in the case of software-based systems is still controversial. This paper gives an overview of reliability modelling work we carried out in recent projects on design diversity, presented in the context of previous knowledge and practice. These results provide additional insight for decisions in applying diversity and in assessing diverseredundant systems. A general observation is that, just as diversity is a very general design approach, the models of diversity can help conceptual understanding of a range of different situations. We summarise results in the general modelling of common-mode failure, in inference from observed failure data, and in decision-making for diversity in development.
The problems of assessing software reliability ...When you really need to depend on it
This paper looks at the ways in which the reliability of software can be assessed and predicted. It shows that the levels of reliability that can be claimed with scientific justification are relatively modest
Recommended from our members
Evaluation of software dependability
It has been said that the term software engineering is an aspiration not a description. We would like to be able to claim that we engineer software, in the same sense that we engineer an aero-engine, but most of us would agree that this is not currently an accurate description of our activities. My suspicion is that it never will be.
From the point of view of this essay – i.e. dependability evaluation – a major difference between software and other engineering artefacts is that the former is pure design. Its unreliability is always the result of design faults, which in turn arise as a result of human intellectual failures. The unreliability of hardware systems, on the other hand, has tended until recently to be dominated by random physical failures of components – the consequences of the ‘perversity of nature’. Reliability theories have been developed over the years which have successfully allowed systems to be built to high reliability requirements, and the final system reliability to be evaluated accurately. Even for pure hardware systems, without software, however, the very success of these theories has more recently highlighted the importance of design faults in determining the overall reliability of the final product. The conventional hardware reliability theory does not address this problem at all.
In the case of software, there is no physical source of failures, and so none of the reliability theory developed for hardware is relevant. We need new theories that will allow us to achieve required dependability levels, and to evaluate the actual dependability that has been achieved, when the sources of the faults that ultimately result in failure are human intellectual failures
Recommended from our members
The use of proofs in diversity arguments
The limits to the reliability that can be claimed for a design-diverse fault-tolerant system are mainly determined by the dependence that must be expected in the failure behaviours of the different versions: claims for independence between version failure processes are not believable. In this note we examine a different approach, in which a simple secondary system is used as a back-up to a more complex primary. The secondary system is sufficiently simple that claims for its perfection (with respect to design faults) are possible, but there is not complete certainty about such perfection. It is shown that assessment of the reliability of the overall fault-tolerant system in this case may take advantage of claims for independence that are more plausible than those involved in design diversity
The impact of diversity upon common mode failures
Recent models for the failure behaviour of systems involving redundancy and diversity have shown that common mode failures can be accounted for in terms of the variability of the failure probability of components over operational environments. Whenever such variability is present, we can expect that the overall system reliability will be less than we could have expected if the components could have been assumed to fail independently. We generalise a model of hardware redundancy due to Hughes [Hughes 1987], and show that with forced diversity, this unwelcome result no longer applies: in fact it becomes theoretically possible to do better than would be the case under independence of failures
Recommended from our members
Software fault-freeness and reliability predictions
Many software development practices aim at ensuring that software is correct, or fault-free. In safety critical applications, requirements are in terms of probabilities of certain behaviours, e.g. as associated to the Safety Integrity Levels of IEC 61508. The two forms of reasoning - about evidence of correctness and about probabilities of certain failures -are rarely brought together explicitly. The desirability of using claims of correctness has been argued by many authors, but not been taken up in practice. We address how to combine evidence concerning probability of failure together with evidence pertaining to likelihood of fault-freeness, in a Bayesian framework. We present novel results to make this approach practical, by guaranteeing reliability predictions that are conservative (err on the side of pessimism), despite the difficulty of stating prior probability distributions for reliability parameters. This approach seems suitable for practical application to assessment of certain classes of safety critical systems
Reasoning about the Reliability of Diverse Two-Channel Systems in which One Channel is "Possibly Perfect"
This paper considers the problem of reasoning about the reliability of fault-tolerant systems with two "channels" (i.e., components) of which one, A, supports only a claim of reliability, while the other, B, by virtue of extreme simplicity and extensive analysis, supports a plausible claim of "perfection." We begin with the case where either channel can bring the system to a safe state. We show that, conditional upon knowing pA (the probability that A fails on a randomly selected demand) and pB (the probability that channel B is imperfect), a conservative bound on the probability that the system fails on a randomly selected demand is simply pA.pB. That is, there is conditional independence between the events "A fails" and "B is imperfect." The second step of the reasoning involves epistemic uncertainty about (pA, pB) and we show that under quite plausible assumptions, a conservative bound on system pfd can be constructed from point estimates for just three parameters. We discuss the feasibility of establishing credible estimates for these parameters. We extend our analysis from faults of omission to those of commission, and then combine these to yield an analysis for monitored architectures of a kind proposed for aircraft
Software reliability and dependability: a roadmap
Shifting the focus from software reliability to user-centred measures of dependability in complete software-based systems. Influencing design practice to facilitate dependability assessment. Propagating awareness of dependability issues and the use of existing, useful methods. Injecting some rigour in the use of process-related evidence for dependability assessment. Better understanding issues of diversity and variation as drivers of dependability. Bev Littlewood is founder-Director of the Centre for Software Reliability, and Professor of Software Engineering at City University, London. Prof Littlewood has worked for many years on problems associated with the modelling and evaluation of the dependability of software-based systems; he has published many papers in international journals and conference proceedings and has edited several books. Much of this work has been carried out in collaborative projects, including the successful EC-funded projects SHIP, PDCS, PDCS2, DeVa. He has been employed as a consultant t
Some conservative stopping rules for the operational testing of safety-critical software
Operational testing, which aims to generate sequences of test cases with the same statistical properties as those that would be experienced in real operational use, can be used to obtain quantitative measures of the reliability of software. In the case of safety critical software it is common to demand that all known faults are removed. This means that if there is a failure during the operational testing, the offending fault must be identified and removed. Thus an operational test for safety critical software takes the form of a specified number of test cases (or a specified period of working) that must be executed failure-free. This paper addresses the problem of specifying the numbers of test cases (or time periods) required for a test, when the previous test has terminated as a result of a failure. It has been proposed that, after the obligatory fix of the offending fault, the software should be treated as if it were completely novel, and be required to pass exactly the same test as originally specified. The reasoning here claims to be conservative, inasmuch as no credit is given for any previous failure-free operation prior to the failure that terminated the test. We show that, in fact, this is not a conservative approach in all cases, and propose instead some new Bayesian stopping rules. We show that the degree of conservatism in stopping rules depends upon the precise way in which the reliability requirement is expressed. We define a particular form of conservatism that seems desirable on intuitive grounds, and show that the stopping rules that exhibit this conservatism are also precisely the ones that seem preferable on other grounds
Reasoning About the Reliability of Multi-version, Diverse Real-Time Systems
This paper is concerned with the development of reliable real-time systems for use in high integrity applications. It advocates the use of diverse replicated channels, but does not require the dependencies between the channels to be evaluated. Rather it develops and extends the approach of Little wood and Rush by (for general systems) by investigating a two channel system in which one channel, A, is produced to a high level of reliability (i.e. has a very low failure rate), while the other, B, employs various forms of static analysis to sustain an argument that it is perfect (i.e. it will never miss a deadline). The first channel is fully functional, the second contains a more restricted computational model and contains only the critical computations. Potential dependencies between the channels (and their verification) are evaluated in terms of aleatory and epistemic uncertainty. At the aleatory level the events ''A fails" and ''B is imperfect" are independent. Moreover, unlike the general case, independence at the epistemic level is also proposed for common forms of implementation and analysis for real-time systems and their temporal requirements (deadlines). As a result, a systematic approach is advocated that can be applied in a real engineering context to produce highly reliable real-time systems, and to support numerical claims about the level of reliability achieved
- …